Neural Aesthetic Image Reviewer
Recently, there has been rising interest in perceiving image aesthetics. The
existing works deal with image aesthetics as a classification or regression
problem. To extend the cognition from rating to reasoning, a deeper
understanding of aesthetics should be based on revealing why a high- or
low-aesthetic score should be assigned to an image. From such a point of view,
we propose a model referred to as Neural Aesthetic Image Reviewer, which can
not only give an aesthetic score for an image, but also generate a textual
description explaining why the image leads to a plausible rating score.
Specifically, we propose two multi-task architectures based on shared
aesthetically semantic layers and task-specific embedding layers at a high
level for performance improvement on different tasks. To facilitate research
on this problem, we collect the AVA-Reviews dataset, which contains 52,118
images and 312,708 comments in total. Through multi-task learning, the proposed
models can rate aesthetic images as well as produce comments in an end-to-end
manner. It is confirmed that the proposed models outperform the baselines
according to the performance evaluation on the AVA-Reviews dataset. Moreover,
we demonstrate experimentally that our model can generate textual reviews
related to aesthetics, which are consistent with human perception.
Comment: 8 pages, 13 figures
Predicting Token Impact Towards Efficient Vision Transformer
Token filtering to reduce irrelevant tokens prior to self-attention is a
straightforward way to enable an efficient vision Transformer. This is the first
work to view token filtering from a feature selection perspective, where we
weigh the importance of a token according to how much it can change the loss
once masked. If the loss changes greatly after masking a token of interest, it
means that such a token has a significant impact on the final decision and is
thus relevant. Otherwise, the token is less important for the final decision,
so it can be filtered out. After applying the token filtering module
generalized from the whole training data, the number of tokens fed to the
self-attention module can be markedly reduced in the inference phase, leading
to far fewer computations in all the subsequent self-attention layers. The
token filter can be realized using a very simple network, for which we use a
multi-layer perceptron. Apart from performing token filtering only once, at
the very beginning before self-attention, the other core feature that
distinguishes our method from other token filters is the predictability of
token impact from a feature selection point of view. The experiments show that
the proposed method provides an efficient way to obtain a lightweight model
after being optimized with a backbone via fine-tuning, which is easier to
deploy than existing methods based on training from scratch.
Comment: 10 pages
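The masking criterion described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_model` is a hypothetical stand-in for a backbone (mean pooling plus a linear layer), masking is assumed to mean zeroing a token, and the impacts are computed directly rather than predicted by a trained multi-layer perceptron as the abstract describes.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for a single sample."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def toy_model(tokens, weights):
    """Hypothetical backbone stand-in: mean-pool the tokens, then a linear layer."""
    dim = len(tokens[0])
    pooled = [sum(t[d] for t in tokens) / len(tokens) for d in range(dim)]
    return [sum(w[d] * pooled[d] for d in range(dim)) for w in weights]

def token_impacts(tokens, weights, label):
    """Absolute change in loss when each token is masked (zeroed) in turn."""
    base = cross_entropy(toy_model(tokens, weights), label)
    impacts = []
    for i in range(len(tokens)):
        masked = [[0.0] * len(t) if j == i else t for j, t in enumerate(tokens)]
        impacts.append(abs(cross_entropy(toy_model(masked, weights), label) - base))
    return impacts

def keep_top_k(tokens, impacts, k):
    """Keep the k highest-impact tokens, preserving their original order."""
    top = sorted(range(len(tokens)), key=lambda i: impacts[i], reverse=True)[:k]
    return [tokens[i] for i in sorted(top)]

# Illustrative data only: three 2-dimensional tokens, two classes.
tokens = [[2.0, 0.0], [0.0, 0.0], [0.1, 0.4]]
weights = [[1.0, 0.0], [0.0, 1.0]]
impacts = token_impacts(tokens, weights, label=0)
kept = keep_top_k(tokens, impacts, k=2)
```

In this toy setting the all-zero token has no effect on the loss when masked, so it is the one filtered out. In the method as described, such impact values would presumably serve as the training signal for the filter network, which then predicts token relevance at inference time without access to the label.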
TSViT: A Time Series Vision Transformer for Fault Diagnosis
Traditional fault diagnosis methods using Convolutional Neural Networks
(CNNs) face limitations in capturing temporal features (i.e., the variation of
vibration signals over time). To address this issue, this paper introduces a
novel model, the Time Series Vision Transformer (TSViT), specifically designed
for fault diagnosis. On the one hand, the TSViT model integrates a convolutional
layer to segment vibration signals and capture local features. On the other hand, it
employs a transformer encoder to learn long-term temporal information.
Comparisons with other methods on two distinct datasets validate the
effectiveness and generalizability of TSViT, together with a comparative analysis
of the impact of its hyperparameters on model performance, computational
complexity, and overall parameter count. TSViT reaches average accuracies of 100%
and 99.99% on the two test sets, respectively.
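One way a convolutional layer can segment a signal into tokens is a strided 1D convolution, sketched below. The function name `conv1d_patch_embed`, the hand-set kernels, and the choice of a stride equal to the kernel width (non-overlapping segments) are all assumptions made for illustration; the abstract does not specify them.

```python
def conv1d_patch_embed(signal, kernels, stride):
    """Strided 1D convolution: each window yields one token with len(kernels) channels."""
    width = len(kernels[0])
    tokens = []
    for start in range(0, len(signal) - width + 1, stride):
        window = signal[start:start + width]
        tokens.append([sum(k * x for k, x in zip(kernel, window)) for kernel in kernels])
    return tokens

# Illustrative vibration-like signal of 8 samples.
signal = [0.0, 1.0, 0.0, -1.0, 0.0, 2.0, 0.0, -2.0]
kernels = [
    [0.25, 0.25, 0.25, 0.25],  # hypothetical filter: local mean
    [0.0, 1.0, 0.0, -1.0],     # hypothetical filter: local swing
]
tokens = conv1d_patch_embed(signal, kernels, stride=4)
```

Each window of four samples becomes one two-channel token. A sequence of such tokens is what a transformer encoder would then attend over to capture longer-range temporal structure; in a trained model the kernels would be learned rather than set by hand.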
Dynamic Circular Network-Based Federated Dual-View Learning for Multivariate Time Series Anomaly Detection
Multivariate time-series data exhibit intricate correlations in both temporal and spatial dimensions. However, existing network architectures often overlook dependencies in the spatial dimension and struggle to strike a balance between long-term and short-term patterns when extracting features from the data. Furthermore, industries within the business community are hesitant to share their raw data, which hinders anomaly prediction accuracy and detection performance. To address these challenges, the authors propose a dynamic circular network-based federated dual-view learning approach. Experimental results on four open-source datasets demonstrate that the method outperforms existing methods in terms of accuracy, recall, and F1 score for anomaly detection.